2025.10.09 | Ming-UniVision's unified visual tokenizer; direct KV-Cache links let LLMs talk instantly
Description
The 15 papers in this episode:
[00:21] 🔄 Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
[00:59] 🧠 Cache-to-Cache: Direct Semantic Communication Between Large Language Models
[01:32] 🌀 Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
[02:07] 🧠 SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
[03:06] 🤖 RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
[04:02] 🎬 MATRIX: Mask Track Alignment for Interaction-aware Video Generation
[04:51] 🎯 Vibe Checker: Aligning Code Evaluation with Human Preference
[05:44] 🤖 Multi-Agent Tool-Integrated Policy Optimization
[06:24] 🧠 CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling
[06:59] ✂ OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
[07:52] 🧠 Artificial Hippocampus Networks for Efficient Long-Context Modeling
[08:30] 🔍 Revisiting Long-context Modeling from Context Denoising Perspective
[09:11] 🧠 Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
[09:51] 💥 Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
[10:37] ⚡ Native Hybrid Attention for Efficient Sequence Modeling
[Follow Us]
You can also find us on the following platforms for more content beyond the podcast episodes
Xiaohongshu: AI速递